17 research outputs found

    Content-Based Retrieval of Document Images Using Sketch Queries

    This doctoral dissertation was initially published in summary form only, owing to unavoidable circumstances that made the full text unsuitable for publication. As those circumstances have been resolved, the full text was made public on April 20, 2020. 筑波大学 (University of Tsukuba)

    A novel shape descriptor based on salient keypoints detection for binary image matching and retrieval

    We introduce a shape descriptor that extracts keypoints from binary images and automatically detects the salient ones among them. The proposed descriptor operates as follows: First, the contours of the image are detected and an image transformation is used to generate background information. Next, pixels of the transformed image that have specific characteristics in their local areas are used to extract keypoints. Afterwards, the most salient keypoints are automatically detected by filtering out redundant and sensitive ones. Finally, a feature vector is calculated for each keypoint from the distribution of contour points in its local area. The proposed descriptor is evaluated on public datasets of silhouette images, handwritten mathematical expressions, hand-drawn diagram sketches, and noisy scanned logos. Experimental results show that the proposed descriptor compares favorably with state-of-the-art methods and remains reliable on challenging images such as fluctuating handwriting and noisy scans. Furthermore, we integrate our descriptor…
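The saliency-filtering idea in the abstract can be illustrated with a minimal Python sketch: detect high-curvature contour points as keypoint candidates, then suppress redundant neighbors. The curvature measure, thresholds, and suppression rule below are simplifications invented for this sketch, not the paper's actual descriptor.

```python
import math

def curvature(points, i, k=2):
    # Approximate turning angle at contour point i using neighbors k steps away.
    (x0, y0), (x1, y1), (x2, y2) = points[i - k], points[i], points[(i + k) % len(points)]
    a = math.atan2(y1 - y0, x1 - x0)
    b = math.atan2(y2 - y1, x2 - x1)
    return abs(math.atan2(math.sin(b - a), math.cos(b - a)))

def salient_keypoints(points, angle_thresh=0.5, min_dist=3.0):
    # Keep high-curvature points, then drop candidates that lie too close
    # to an already-kept keypoint (a crude stand-in for saliency filtering).
    cands = sorted(range(len(points)), key=lambda i: -curvature(points, i))
    kept = []
    for i in cands:
        if curvature(points, i) < angle_thresh:
            break
        if all(math.dist(points[i], points[j]) >= min_dist for j in kept):
            kept.append(i)
    return sorted(kept)

# A square contour: the four corners should be the salient keypoints.
square = [(x, 0) for x in range(5)] + [(4, y) for y in range(1, 5)] \
       + [(x, 4) for x in range(3, -1, -1)] + [(0, y) for y in range(3, 0, -1)]
print(salient_keypoints(square))  # → [0, 4, 8, 12]
```

The corner indices 0, 4, 8, and 12 survive because the near-corner points, whose turning angle also exceeds the threshold, fall within `min_dist` of a corner and are suppressed.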

    Educational video classification by using a transcript to image transform and supervised learning

    In this work, we present a method for automatic topic classification of educational videos using a speech transcript transform. Our method works as follows: First, speech recognition is used to generate video transcripts. Then, the transcripts are converted into images using a statistical co-occurrence transformation that we designed. Finally, a classifier produces video category labels from a transcript image input. As classifiers, we report results for a convolutional neural network (CNN) and a principal component analysis (PCA) model. To evaluate our method, we used the Khan Academy on a Stick dataset, which contains 2,545 videos, each labeled with one or two of 13 categories. Experiments show that our method is effective and strongly competitive with other supervised learning-based methods.
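The transcript-to-image step can be sketched as a word co-occurrence matrix rendered as a grayscale image. The abstract does not specify the exact statistical transformation, so the vocabulary, window size, and normalization below are illustrative assumptions.

```python
def transcript_to_image(transcript, vocab, window=2):
    # Count co-occurrences of vocabulary words within a sliding window,
    # then scale counts to [0, 255] so the matrix can be treated as a
    # grayscale image by an image classifier such as a CNN.
    idx = {w: i for i, w in enumerate(vocab)}
    words = [w for w in transcript.lower().split() if w in idx]
    n = len(vocab)
    M = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                M[idx[w]][idx[words[j]]] += 1
    peak = max(max(row) for row in M) or 1
    return [[int(v / peak * 255) for v in row] for row in M]

# Hypothetical vocabulary; a real system would use a learned vocabulary.
vocab = ["force", "mass", "acceleration", "cell"]
img = transcript_to_image("force equals mass times acceleration", vocab)
```

Transcripts about physics light up one region of the image while biology transcripts light up another, which is what gives an image classifier a signal to learn from.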

    Sketch-Based Image Retrieval By Size-Adaptive and Noise-Robust Feature Description

    We review available methods for Sketch-Based Image Retrieval (SBIR) and discuss their limitations. We then present two SBIR algorithms: the first extracts shape features using support regions calculated for each sketch point, and the second adapts the Shape Context descriptor [1] to make it scale invariant and to enhance its performance in the presence of noise. Both algorithms share the property of calculating the feature extraction window according to the sketch size. Experiments and a comparative evaluation against state-of-the-art methods show that the proposed algorithms are competitive in distinctiveness and robust against noise.
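The scale-invariance adaptation can be illustrated by normalizing the Shape Context's radial distances by the mean distance to the reference point before binning; the log-polar quantization below is a simplified stand-in for the actual descriptor.

```python
import math

def shape_context(points, i, n_r=3, n_theta=4):
    # Log-polar histogram of the remaining points relative to points[i].
    # Normalizing radii by the mean distance makes the histogram invariant
    # to uniform scaling of the point set.
    cx, cy = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    dists = [math.hypot(x - cx, y - cy) for x, y in others]
    mean_d = sum(dists) / len(dists)
    hist = [0] * (n_r * n_theta)
    for (x, y), d in zip(others, dists):
        r = min(n_r - 1, int(math.log2(d / mean_d + 1) * n_r / 2))  # crude log binning
        t = int((math.atan2(y - cy, x - cx) + math.pi) / (2 * math.pi) * n_theta) % n_theta
        hist[r * n_theta + t] += 1
    return hist

pts = [(0, 0), (1, 0), (0, 1), (2, 2)]
scaled = [(10 * x, 10 * y) for x, y in pts]
print(shape_context(pts, 0) == shape_context(scaled, 0))  # → True
```

Because only the ratios `d / mean_d` and the angles enter the bins, scaling every point by the same factor leaves the histogram unchanged.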

    A modular approach for query spotting in document images and its optimization using genetic algorithms

    Query spotting in document images is a subclass of Content-Based Image Retrieval (CBIR) concerned with detecting occurrences of a query in a document image. Due to the noise and complexity of document images, spotting is a challenging task that is prone to false positives and partially incorrect matches, which reduce the overall precision of the algorithm. A robust and accurate spotting algorithm is essential to our current research on sketch-based retrieval of digitized lecture materials. We recently proposed a modular spotting algorithm in [1]. Compared to existing methods, our algorithm is both application-independent and segmentation-free; however, it faces the same challenges of image noise and complexity. In this paper, inspired by our earlier research on optimizing parameter settings for CBIR with an evolutionary algorithm [2][3], we introduce a Genetic Algorithm-based optimization step into our spotting algorithm to improve each spotting result. Experiments on an image dataset of journal pages reveal promising performance: precision is significantly improved without compromising the recall of the overall spotting result.
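A minimal sketch of a Genetic Algorithm tuning a real-valued parameter vector against a fitness function, in the spirit of the optimization step described above. The operators, population settings, and the toy fitness function are illustrative assumptions, not the paper's configuration.

```python
import random

def genetic_optimize(fitness, bounds, pop=20, gens=40, seed=0):
    # Minimal generational GA: elitism, tournament selection,
    # uniform crossover, and Gaussian mutation of one gene.
    rng = random.Random(seed)
    P = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    for _ in range(gens):
        nxt = [max(P, key=fitness)]  # elitism: keep the best individual
        while len(nxt) < pop:
            a, b = (max(rng.sample(P, 3), key=fitness) for _ in range(2))
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            k = rng.randrange(len(bounds))  # mutate one gene, clamped to bounds
            lo, hi = bounds[k]
            child[k] = min(hi, max(lo, child[k] + rng.gauss(0, 0.1 * (hi - lo))))
            nxt.append(child)
        P = nxt
    return max(P, key=fitness)

# Toy spotting "precision" surface that peaks at threshold = 0.4, window = 0.7;
# a real fitness would re-score a spotting result under the candidate settings.
best = genetic_optimize(lambda p: -((p[0] - 0.4) ** 2 + (p[1] - 0.7) ** 2),
                        bounds=[(0, 1), (0, 1)])
```

In the spotting context, each individual would encode the spotting parameters and the fitness would measure the quality of the resulting spotting result.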

    An Application-Independent and Segmentation-Free Approach for Spotting Queries in Document Images

    We report our ongoing research on an application-independent and segmentation-free approach for spotting queries in document images. Building on our earlier work reported in [1][2], this paper introduces an image processing approach that finds occurrences of a query, which is a multi-part object, in a document image through five steps: (1) Preprocessing, for image normalization and connected component extraction. (2) Feature Extraction, from connected components. (3) Matching, of the feature vectors of the query and document image connected components. (4) Voting, to determine candidate occurrences in the document image that are similar to the query. (5) Candidate Filtering, to detect relevant occurrences and filter out irrelevant patterns. Compared to existing methods, our contributions are twofold: First, our approach is designed to handle any type of query, without restriction to a particular class such as words or mathematical expressions. Second, it does not apply domain-specific segmentation to extract regions of interest from the document image, such as text paragraphs or mathematical calculations; instead, it considers all the image information. Experimental evaluation on scanned journal images shows promising performance and the possibility of further improvement.
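Step (4), Voting, can be sketched as offset accumulation in the spirit of a generalized Hough transform: each matched pair of components votes for the query origin it implies. The quantization and the `match` predicate below are illustrative assumptions standing in for step (3).

```python
from collections import Counter

def vote_occurrences(query_cc, doc_cc, match, tol=5):
    # Every matched (query component, document component) pair votes for
    # the query origin it implies; quantizing offsets by `tol` lets nearby
    # votes accumulate, and strong clusters become candidate occurrences.
    votes = Counter()
    for qx, qy in query_cc:
        for dx, dy in doc_cc:
            if match((qx, qy), (dx, dy)):
                votes[((dx - qx) // tol, (dy - qy) // tol)] += 1
    return votes.most_common()

# Toy example: the document holds a copy of the two-component query shifted
# by (40, 10), plus one unrelated component; `match` is a stub standing in
# for the feature matching of step (3).
query = [(0, 0), (12, 3)]
doc = [(40, 10), (52, 13), (90, 90)]
ranked = vote_occurrences(query, doc, match=lambda q, d: True)
print(ranked[0])  # → ((8, 2), 2): the true offset cell gets the most votes
```

Candidate Filtering, step (5), would then discard cells whose vote count falls below a fraction of the number of query components.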

    A comparative study using contours and skeletons as shape representations for binary image matching

    Contours and skeletons are well-known shape representations that embody visual information using a limited set of object points. Both representations have been applied in various pattern recognition applications, and studies in cognitive science have investigated their roles in human perception. Although their importance has been demonstrated in these fields, to our knowledge no existing study has compared their performance. Filling this gap, this paper is an empirical study of the two shape representations that compares their performance across different binary image categories and variations. The image categories include thick, elongated, and nearly thin images; the variations include addition of noise to the contours, blurring, and size reduction. The comparative evaluation is carried out using object classification (OC) and content-based image retrieval (CBIR) algorithms and evaluation metrics. The main findings highlight the overall superiority of contours, but also the improvements observed when skeletons are used for images with noisy contours.

    Towards a segmentation and recognition-free approach for content-based document image retrieval of handwritten queries

    We introduce a method for content-based document image retrieval (CBDIR) of handwritten queries that is both segmentation- and recognition-free. We first show that our method is underpinned by a theoretical model that exploits Bayes' rule. Next, we present an algorithmic implementation that takes into account real-world retrieval challenges caused by handwriting fluctuations and style variations. Our algorithm operates as follows: First, a number of connected components of the query are matched against the connected components of the document image using shape features, and a similarity threshold is used to select the document image components that are most similar to the query components. Then, the selected components are used to detect candidate occurrences of the query in the document image using size-adaptive bounding boxes. Finally, a score is calculated for each candidate occurrence and used for ranking. We conduct a comparative evaluation of our method on a dataset of 200 printed document images, executing 40 printed and 200 handwritten queries of mathematical expressions. Experimental results demonstrate competitive performance, with P-Recall = 100% and A-Recall = 99.95% for printed queries, and P-Recall = 73.5% and A-Recall = 57.92% for handwritten queries, outperforming a state-of-the-art CBDIR algorithm.
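The matching, candidate detection, and scoring steps can be sketched as follows, assuming one scalar feature per connected component and a fixed bounding box size (the paper's boxes are size-adaptive, derived from the query); all names and parameters here are illustrative.

```python
def rank_candidates(query_feats, doc, dist, thresh=0.3, box=(30, 15)):
    # doc: list of ((x, y), feature) pairs for document connected components.
    # 1) keep components within the similarity threshold of some query part,
    # 2) grow a bounding box around each surviving component,
    # 3) score each box by the fraction of query components it covers.
    hits = [(p, f) for p, f in doc
            if any(dist(qf, f) <= thresh for qf in query_feats)]
    cands = []
    for (sx, sy), _ in hits:
        inside = [f for (x, y), f in hits
                  if sx <= x <= sx + box[0] and sy <= y <= sy + box[1]]
        covered = sum(any(dist(qf, f) <= thresh for f in inside)
                      for qf in query_feats)
        cands.append(((sx, sy), covered / len(query_feats)))
    return sorted(cands, key=lambda c: -c[1])

# Toy query with two scalar "shape features"; the document contains both
# near (0, 0) and a lone distractor far away.
query_feats = [1.0, 2.0]
doc = [((0, 0), 1.02), ((10, 5), 1.98), ((100, 100), 1.0)]
ranked = rank_candidates(query_feats, doc, dist=lambda a, b: abs(a - b))
print(ranked[0])  # → ((0, 0), 1.0)
```

The fractional coverage score is what tolerates handwriting fluctuation: a candidate that matches most, but not all, query components still ranks above isolated partial matches.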